## 
##    1    2    3    4    5 
##    5  124 2615   24    3
## 
##    1    2    3    4    5 
##    3   92  864  476 1495
## 
##    Brick & Tile    Cinder Block Poured Concrete            Slab           Stone 
##             311            1244            1310              49              11 
##            Wood 
##               5
## 
## Asbestos Shingles  Asphalt Shingles           AsphShn      Brick Common 
##                44               420                 2                 6 
##        Brick Face      Cement Board      Cinder Black        Hard Board 
##                88               126                 2               442 
##  Imitation Stucco      Metal Siding           Plywood           PreCast 
##                 1               450               221                 1 
##             Stone            Stucco      Vinyl Siding     Wood Shingles 
##                 2                43              1026                56
## 
## FALSE 
##  2930
## 
## FALSE 
##  2930
## 
## FALSE 
##  2930
## 
## FALSE  TRUE 
##  2771   159

Introduction

The aim of this project….

This project used data from 2930 residential property sales in Ames, Iowa between 2006 and 2010. The data set contains 82 variables, a mix of nominal, ordinal, discrete, and continuous attributes. Continuous variables describe the area dimensions of the house and property, such as the size of the lot and the garage. Discrete variables quantify characteristics of the house and property, such as the number of kitchens, baths, bedrooms, and parking spots. Nominal variables generally describe types of materials and locations, such as the name of the neighborhood or the type of foundation. Ordinal variables typically rate the condition and quality of house characteristics and utilities.

Exploratory Data Analysis

We decided to keep this variable continuous rather than converting it to a factor. Converting it would have led us to drop the "Very Poor" ("1") factor level, because this level has only around 4 observations. Keeping the variable continuous lets us retain these observations and so better predict the prices of homes that fall into this category.
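The trade-off described above can be sketched in a few lines. This is an illustrative Python sketch, not the code used for the report (whose output appears to come from R). The ratings, the `min_count` cutoff of 10, and the helper names `sparse_levels`, `as_numeric`, and `as_dummies` are all hypothetical.

```python
from collections import Counter


def sparse_levels(ratings, min_count=10):
    """Return {level: count} for levels with fewer than min_count observations."""
    counts = Counter(ratings)
    return {level: n for level, n in sorted(counts.items()) if n < min_count}


def as_numeric(rating):
    """Continuous coding: one column, so every level shares a single slope."""
    return [float(rating)]


def as_dummies(rating, levels):
    """Factor coding: one indicator column per level except the reference (first) level."""
    return [1.0 if rating == level else 0.0 for level in levels[1:]]


# Hypothetical ratings, not taken from the Ames data: level 1 has only 4 observations.
ratings = [1] * 4 + [2] * 30 + [3] * 60 + [4] * 40
levels = sorted(set(ratings))

print(sparse_levels(ratings))        # levels too sparse to estimate on their own
print(len(as_numeric(3)))            # number of columns under continuous coding
print(len(as_dummies(3, levels)))    # number of columns under factor coding
```

With factor coding, each level's effect is estimated only from the observations at that level, so a level with a handful of rows is poorly determined. With continuous coding, the sparse level borrows strength from the overall trend, at the cost of assuming equal spacing between levels.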

Exploring Selected Home Characteristics in the Dataset

Sale Price graph

The lot area variable has many outliers, as shown above. We found 127 outliers on the high end, the smallest of which has a lot area of 17,755 square feet. Because these made visualization difficult, we temporarily removed them. With the outliers removed, lot area follows a roughly normal distribution centered near the median of 9,436.5 square feet.
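The text does not say which rule was used to flag outliers. A minimal sketch, assuming the common boxplot rule (values above the third quartile plus 1.5 times the interquartile range), might look like the following. The function name `upper_outliers` and the lot areas are hypothetical, and the quartile method is an assumption that can shift the threshold slightly.

```python
import statistics


def upper_outliers(values, k=1.5):
    """Return (threshold, outliers) for the rule: value > Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return threshold, sorted(v for v in values if v > threshold)


# Hypothetical lot areas in square feet, not taken from the Ames data.
lot_areas = [7200, 8400, 9000, 9500, 10200, 11000, 12500, 21000, 53000]

threshold, outliers = upper_outliers(lot_areas)

# Temporarily drop the flagged values for plotting; the full data stays untouched.
trimmed = [v for v in lot_areas if v <= threshold]

print(threshold, outliers, statistics.median(trimmed))
```

Filtering into a separate list, rather than deleting rows, matches the temporary removal described above: the outliers stay available for modeling.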

From Figure 3, we see that 1-story homes built in 1946 or later make up the largest group in our dataset, with 1079 homes. This is about 37%, or over one-third, of the 2930 observations in the dataset. Please note that the graphs are interactive, so move your cursor over a graph to see more details. We can also observe from Figure 4 that most homes were built within about five years of 2005.

Summary Statistics

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.000   3.511   4.000   5.000

Relationship Between Sale Price and Selected Characteristics

Some intro text

Neighborhood

We observe large variation in sale price across neighborhoods, and also within each neighborhood. Investigating some housing characteristics may give us insight into the price variation observed within neighborhoods.

As expected, sale price increases as overall quality increases.

As one would expect, the newer a home is, the higher its price, on average.

Looking at sale price by home type, we find that 2-story homes built in 1946 or later have the highest median sale price.
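A comparison like this amounts to grouping sales by home type and taking the median of each group. The sketch below shows one way to do that in Python. The `median_price_by_group` helper, the group labels, and the prices are hypothetical and do not reproduce the report's figures.

```python
from collections import defaultdict
from statistics import median


def median_price_by_group(rows):
    """Map each group label to the median of its sale prices."""
    groups = defaultdict(list)
    for group, price in rows:
        groups[group].append(price)
    return {group: median(prices) for group, prices in groups.items()}


# Hypothetical (home type, sale price) pairs, not taken from the Ames data.
sales = [
    ("1-story, 1946 or later", 150000),
    ("1-story, 1946 or later", 170000),
    ("1-story, 1946 or later", 160000),
    ("2-story, 1946 or later", 210000),
    ("2-story, 1946 or later", 230000),
    ("1-story, 1945 or older", 110000),
    ("1-story, 1945 or older", 120000),
]

medians = median_price_by_group(sales)
top_group = max(medians, key=medians.get)  # group with the highest median price

print(medians)
print(top_group)
```

The median is used rather than the mean because a few very expensive homes would otherwise dominate a group's summary.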

We see a positive but non-linear relationship between kitchen quality and sale price: the higher the kitchen quality, the higher the median sale price.

From Figure 10, we can see that, as expected, there is a gradual positive relationship between lot area and sale price.

Hypothesis

Methodology

Analysis

Discussion

References